Detecting simultaneous changepoints in multiple sequences.
Abstract
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.
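The sum-of-chi-squared scan described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a known common noise standard deviation (`sigma`), a brute-force search over all windows up to a hypothetical `max_width`, and a per-sample statistic formed by standardising the window sum against the sample's overall mean. The function name `sum_chisq_scan` and its parameters are illustrative, and the analytic significance approximations mentioned in the abstract are not reproduced.

```python
import numpy as np


def sum_chisq_scan(y, max_width, sigma=1.0):
    """Brute-force scan returning (statistic, start, end) for the half-open
    window [start, end) that maximises the sum over samples of squared
    standardised window sums.

    y         : array of shape (n_samples, T), one noisy sequence per row
    max_width : largest window width to consider (assumed tuning parameter)
    sigma     : noise standard deviation, assumed known and common to all rows
    """
    y = np.asarray(y, dtype=float)
    n_samples, T = y.shape

    # Prefix sums with a leading zero column, so that
    # csum[:, t] - csum[:, s] equals the sum of y[:, s:t].
    csum = np.concatenate(
        [np.zeros((n_samples, 1)), np.cumsum(y, axis=1)], axis=1
    )
    row_mean = csum[:, -1] / T

    best = (-np.inf, 0, 0)
    for w in range(1, min(max_width, T - 1) + 1):
        starts = np.arange(0, T - w + 1)
        seg_sum = csum[:, starts + w] - csum[:, starts]

        # Under pure noise, seg_sum - w * row_mean has variance
        # sigma^2 * w * (1 - w / T), so z is approximately standard normal.
        z = (seg_sum - w * row_mean[:, None]) / (sigma * np.sqrt(w * (1 - w / T)))

        # Pool evidence across samples: sum of per-sample chi-squared terms.
        stat = (z ** 2).sum(axis=0)

        k = int(np.argmax(stat))
        if stat[k] > best[0]:
            best = (float(stat[k]), int(starts[k]), int(starts[k] + w))
    return best


# Usage sketch: a weak shift present in only 5 of 20 simulated sequences.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 200))
data[:5, 80:95] += 1.0
print(sum_chisq_scan(data, max_width=40))
```

Because the squared terms are summed rather than the signed values, a window can score highly even when the shift goes up in some samples and down in others, and samples with no signal contribute only noise-level terms.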
Related resources
An Empirical Bayesian Analysis of Simultaneous Changepoints in Multiple Data Sequences
Motivated by applications in genomics, finance, and biomolecular simulation, we introduce a Bayesian framework for modeling changepoints that tend to co-occur across multiple related data sequences. We infer the locations and sequence memberships of changepoints in our hierarchical model by developing efficient Markov chain Monte Carlo sampling and posterior mode finding algorithms based on dyn...
Finite-State Markov Chains for Multiple Sequences
We consider the analysis of sets of categorical sequences consisting of piecewise homogeneous Markov segments. The sequences are assumed to be governed by a common underlying process with segments occurring in the same order for each sequence. Segments are defined by a set of unobserved changepoints where the positions and number of changepoints can vary from sequence to sequence. We propose a ...
Optimal detection of changepoints with a linear computational cost
We consider the problem of detecting multiple changepoints in large data sets. Our focus is on applications where the number of changepoints will increase as we collect more data: for example in genetics as we analyse larger regions of the genome, or in finance as we observe time-series over longer periods. We consider the common approach of detecting changepoints through minimising a cost func...
A changepoint analysis of spatio-temporal point processes
This work introduces a Bayesian approach to detecting multiple unknown changepoints over time in the inhomogeneous intensity of a spatio-temporal point process with spatial and temporal dependence within segments. We propose a new method for detecting changes by fitting a spatio-temporal log-Gaussian Cox process model using the computational efficiency and flexibility of integrated nested Lapla...
Sequential Gaussian Process Prediction in the Presence of Changepoints or Faults
We introduce a new sequential algorithm for making robust predictions in the presence of changepoints. Unlike many previous approaches [1], which focus on the problem of detecting and locating changepoints, our algorithm focuses on the problem of making predictions even when such changes might be present. We introduce nonstationary covariance functions to be used in Gaussian process prediction ...
Journal title:
- Biometrika
Volume 97, Issue 3
Pages: -
Publication date: 2010